ABSTRACT

Online social media are complementing and in some cases replac- 
ing person-to-person social interaction and redefining the diffu- 
sion of information. In particular, microblogs have become crucial 
grounds on which public relations, marketing, and political battles 
are fought. We introduce an extensible framework that will enable 
the real-time analysis of meme diffusion in social media by mining, 
visualizing, mapping, classifying, and modeling massive streams 
of public microblogging events. We describe a Web service that 
leverages this framework to track political memes in Twitter and 
help detect astroturfing, smear campaigns, and other misinforma- 
tion in the context of U.S. political elections. We present some 
cases of abusive behaviors uncovered by our service. Finally, we 
discuss promising preliminary results on the detection of suspicious 
memes via supervised learning based on features extracted from the 
topology of the diffusion networks, sentiment analysis, and crowd- 
sourced annotations. 
1. INTRODUCTION

Social networking and microblogging services reach hundreds
of millions of users and have become fertile ground for a variety of
research efforts, since they offer an opportunity to study patterns 
of social interaction among far larger populations than ever before. 
In particular, Twitter has recently generated much attention in the 
research community due to its peculiar features, enormous popu- 
larity, and open policy on data sharing. 

Along with the growth in reach of microblogs, we are also ob- 
serving the emergence of useful information that can be mined 
from their data streams [4, 48, 11]. However, as microblogs
become valuable media to spread information, e.g., for marketeers
and politicians, it is natural that people find ways to abuse them. 
As a result, we observe various types of illegitimate use, such as 
spam [52, 21, 49]. In this paper we focus on one particular type of
abuse, namely political astroturf — campaigns disguised as spon- 
taneous, popular "grassroots" behavior that are in reality carried 
out by a single person or organization. This is related to spam but 
with a more specific domain context, and with potentially larger 
consequences. The importance of political astroturf stems from the 
unprecedented opportunities created by social media for increased 
participation and information awareness among the Internet-connected
public [1, 8, 2].

Online social media tools have played a crucial role in the suc- 
cesses and failures of numerous political campaigns and causes, 
from the grassroots organizing power of Barack Obama's 2008 
presidential campaign, to Howard Dean's failed 2004 presidential 
bid and the first-ever Tea Party rally [46, 39, 51]. Moreover,
traditional media pay close attention to the ebb and flow of
communication on social media platforms, and with this scrutiny comes the
potential for these discussions to reach a far larger audience than 
simply the social media users. 

While some news coverage of social media may seem banal
and superficial, its focus is not without merit. Social media,
such as Twitter, often enjoy substantial user bases with participants
drawn from diverse geographic, social and political backgrounds
[29, 3]. Moreover, the user-as-information-producer model
provides researchers and news organizations alike a means of in- 
strumenting and observing, in real-time, a large sample of the na- 
tion's political participants. So relevant is this discursive space, 
in fact, that the Library of Congress has recently undertaken the 
project of archiving a complete record of the discourse produced by 
Twitter users [41]. Despite the benefits associated with increased
information availability and grassroots political organization, the 
same structural and systematic properties that enable Twitter users
to quickly sound the alarm about a developing emergency [16] can
also be leveraged to spread lies and misinformation. 

Unlike traditional news sources, social media provide little in 
the way of individual accountability or fact-checking mechanisms, 
meaning that catchiness and repeatability, rather than truthfulness, 
can function as the primary drivers of information diffusion in these 
information networks. While flame wars and hyperbole are hardly 
new phenomena online, Twitter's 140-character sound bites are
ready-made headline fodder for the 24-hour news cycle. More than 
just the calculated emissions of high-profile users like Sarah Palin 
and Barack Obama, consider the fact that several major news orga- 
nizations picked up on the messaging frame of a viral tweet relating 
to the allocation of stimulus funds, succinctly describing a study 
of decision making in drug-addicted macaques as "Stimulus $ for 
coke monkeys" [47].

While the "coke monkeys" meme developed organically from 
the attention dynamics of thousands of users, it illustrates the pow- 
erful and potentially detrimental role that social media can play in 
shaping the public discourse. As we will demonstrate, a motivated 
attacker can easily orchestrate a distributed effort to mimic or ini- 
tiate this kind of organic spreading behavior, and with the right 
choice of inflammatory wording, influence a public well beyond 
the confines of his or her own social network. 

Here we describe a system to analyze the diffusion of informa- 
tion in social media, and in particular to automatically identify and 
track orchestrated, deceptive efforts to mimic the organic spread of 
information through the Twitter network. The main contributions 
of this paper are: 

• The introduction of an extensible framework for the real-time 
analysis of meme diffusion in social media by mining, visu- 
alizing, mapping, classifying, and modeling massive streams 
of public microblogging events. (§3)

• The design and implementation of a Web service that lever- 
ages our framework to track political memes in Twitter and 
help detect astroturfing, smear campaigns, and other misinformation
in the context of U.S. political elections. (§4)

• A description and analysis of several cases of abusive behavior
uncovered by our service. (§5)

• Promising preliminary results on the detection of suspicious 
memes via supervised learning, which achieve around 90% 
accuracy based on features extracted from the topology of the 
diffusion networks, sentiment analysis, and crowdsourced 
annotations. (§6)

2. RELATED WORK AND BACKGROUND 
2.1 Information Diffusion 

The study of opinion dynamics and information diffusion in so- 
cial networks has a long history in the social, physical, and compu- 
tational sciences (3^[l5l5}[T3}[6j[3T][32) . While usually referred to 
as 'viral' [38], the way in which information or rumors diffuse in a
network has several important differences with respect to infectious
diseases. Rumors gradually acquire more credibility and appeal as 
more and more network neighbors acquire them. After some time, 
a threshold is crossed and the rumor becomes so widespread that 
it is considered as 'common knowledge' within a community and 
hence, true. In the case of information propagation in the real world 
as well as in the blogosphere, the problem is significantly com- 
plicated by the fact that the social network structure is unknown. 
Without explicit linkage data, investigators must rely on heuristics
at the node level to infer the underlying network structure. Gomez
et al. propose an algorithm that can efficiently approximate linkage
information based on the times at which specific URLs appear
in a network of news sites [19]. However, even in the case of the 
Twitter social network, where explicit follower/followee social relations
exist, they are not all equally important [26]. Fortunately
for our purposes, Twitter provides an explicit way to mark the dif- 
fusion of information in the form of retweets. This metadata tells 
us which links in the social network have actually played a role in the
diffusion of information.

Additionally, conversational aspects of social interaction in Twitter
have recently been studied [12, 24, 20]. For example, Mendoza
et al. examined the reliability of retweeted information in the hours
following the 2010 Chilean earthquake [35]. They found that false 
information is more likely to be questioned by users than reliable 
accounts of the event. Their work is distinct from our own in that it 
does not investigate the dynamics of misinformation propagation. 
Finally, recent modeling work taking into account user behavior, 
user-user influence and resource virulence has been used to predict 
the spread of URLs through the Twitter social network [18].

2.2 Mining Microblog Data 

Several studies have demonstrated that information shared on 
Twitter has some intrinsic value, facilitating, e.g., predictions of 
box office success [4] and the results of political elections [48].
Content has been further analyzed to study consumer reactions to
specific brands [28], the use of tags to alter content [25], its relation
to headline news [30], and the factors that influence the probability
that a meme is retweeted [45]. Other authors have focused
on how passive and active users influence the spreading paths [42]. 

Recent work has leveraged the collective behavior of Twitter 
users to gain insight into a number of diverse phenomena. Analy- 
sis of tweet content has shown that some correlation exists between 
the global mood of its users and important worldwide events [10], 
including stock market fluctuations [11]. Similar techniques have
been applied to infer relationships between media events such as 
presidential debates and affective responses among social media 
users [15]. Sankaranarayanan et al. developed an automated breaking
news detection system based on the linking behavior of Twitter
users [44], while Heer and Boyd describe a system for visualizing 
and exploring the relationships between users in large-scale social 
media systems [23]. Driven by practical concerns, others have suc- 
cessfully approximated the epicenter of earthquakes in Japan by 
treating Twitter users as a geographically-distributed sensor network
[43].

2.3 Political Astroturf and Truthiness 

In the remainder of this paper we describe a system designed 
to detect astroturfing campaigns on Twitter. As an example of 
such a campaign, we turn to an illustrative case study documented 
by Metaxas and Mustafaraj, who describe a concerted, deceitful 
attempt to cause a specific URL to rise to prominence on Twitter
through the use of a distributed network of nine fake user accounts
[36]. In total these accounts produced 929 tweets over the
course of 138 minutes, all of which included a link to a website 
smearing one of the candidates in the 2009 Massachusetts special 
election. The tweets injecting this meme mentioned users who had 
previously expressed interest in the Massachusetts special election, 
who were prime candidates to act as rebroadcasters. In this way the
initiators sought not just to expose a finite audience to a specific URL,
but to trigger an information cascade that would lend a sense of 
credibility and grassroots enthusiasm to a specific political message.
Within hours, a substantial portion of the targeted users retweeted
the link, resulting in rapid spreading that was detected by
Google's real-time search engine. This caused the URL in ques- 
tion to be promoted to the top of the Google results page for the 
query 'martha coakley', a so-called Twitter bomb. This
case study demonstrates the ease with which a focused effort can 
initiate the viral spread of information on Twitter, and the serious 
consequences this can have. 

Our work is related to the detection of spam in Twitter, which 
has been the subject of several recent studies. Grier et al. provide
a general overview of spam on Twitter [21], focusing on spam
designed to cause users to click a specific URL. Grouping together 
tweets about the same URL into spam 'campaigns,' they find a min- 
imal amount of collusion between spammer accounts. Boyd et al. 
also analyze Twitter spam with respect to a particular meme [52].
Using a hand-classified set of 300 tweets, they identify several dif- 
ferences between spam and good user accounts, including the fre- 
quency of tweets, age of accounts, and their respective periphery in 
the social graph. Benevenuto et al. [7] use content and user behavior
attributes to train a machine learning apparatus to detect spam
accounts. They build a classifier that achieves approximately 87% 
accuracy in identifying spam tweets, and similar accuracy in de- 
tecting the spam accounts themselves. 

The mass creation of accounts, the impersonation of users, and 
the posting of deceptive content are all behaviors that are likely 
common to both spam and political astroturfing. However, polit- 
ical astroturf is not exactly the same as spam. While the primary 
objective of a spammer is often to persuade users to click a link, 
someone interested in promoting an astroturf message wants to es- 
tablish a false sense of group consensus about a particular idea. Re- 
lated to this process is the fact that users are more likely to believe 
a message that they perceive as coming from several independent 
sources, or from an acquaintance [27]. Spam detection systems
often focus on the content of a potential spam message — for in- 
stance, to see if the message contains a certain link or set of tags. In 
detecting political astroturf, we focus on how the message is deliv- 
ered rather than on its content. Further, many of the users involved 
in propagating a successfully astroturfed message may in fact be 
legitimate users, who are unwittingly complicit in the deception, 
having been themselves deceived by the original core of automated 
accounts. Thus, existing methods for detecting spam that focus on 
properties of user accounts, such as the number of URLs in tweets 
originating from that account or the interval between successive 
tweets, would be unsuccessful in finding such astroturfed memes. 

In light of these characteristics of political astroturf, we need a 
definition that allows us to discriminate such falsely-propagated in- 
formation from organically propagated information that originates 
at the real grassroots. We thus decided to borrow a term, truthy, 
to describe political astroturf memes. The term was coined by co- 
median Stephen Colbert to describe something that a person claims 
to know based on emotion rather than evidence or facts. We can 
then define our task as the detection of truthy memes in the Twitter 
stream. Not every truthy meme will result in a viral information 
cascade like the one documented by Metaxas and Mustafaraj, but 
we wish to test the hypothesis that the initial stages exhibit common 
signatures that can help us identify this behavior. 

3. ANALYTICAL FRAMEWORK 

Social media analysis presents major challenges in the area of
data management, particularly when it comes to interoperability,
curation, and consistency of process. Due to diversity among site
designs, data models, and APIs, any analytical tools written by
researchers to address one site are not easily portable to another. To
focus on the common features of all social media and microblogging
sites, we developed a unified framework, which we call Klatsch,
that makes it possible to analyze the behavior of users and diffusion
of ideas in a broad variety of data feeds. The Klatsch framework
is designed to provide data interoperability for the real-time
analysis of massive social media data streams (millions of posts per
day) from sites with diverse structures and interfaces. To this end,
we model a generic stream of social networking data as a series
of events that represent interactions between actors and memes, as
shown in Figure 1.

Figure 1: The Klatsch model of streaming social media events.

In the Klatsch model, social networking sites are sources of a 
timestamped series of events. Each event involves some number 
of actors (entities that represent individual users), some number of 
memes (entities that represent units of information at the desired 
level of detail), and interactions among those actors and memes. 
For example, a single Twitter post might constitute an event in- 
volving three or more actors: the poster, the user she is retweeting, 
and the people she is addressing. The post might also involve a 
set of memes consisting of 'hashtags' and URLs referenced in the 
tweet. Each event can be thought of as contributing a unit of weight 
to edges in a network structure, where nodes are associated with 
either actors or memes. This is not a strictly bipartite network: ac- 
tors can be linked through replying or mentioning, and memes by 
concurrent discussion or semantic similarity. The timestamps asso- 
ciated with the events allow us to observe the changing structure of 
this network over time. 
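
The event model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Klatsch data model itself: the Event class, its field names, and the rule that every pair of nodes co-occurring in an event receives one unit of weight are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Event:
    """One timestamped interaction (hypothetical schema, not Klatsch's own)."""
    timestamp: float
    actors: list = field(default_factory=list)
    memes: list = field(default_factory=list)


def accumulate_edges(events):
    """Add one unit of weight per event to each pair of nodes it involves."""
    weights = Counter()
    for event in events:
        nodes = {("actor", a) for a in event.actors}
        nodes |= {("meme", m) for m in event.memes}
        # Assumption: every pair of co-occurring nodes is linked, so
        # actor-actor and meme-meme edges arise alongside actor-meme ones.
        for u, v in combinations(sorted(nodes), 2):
            weights[(u, v)] += 1
    return weights


events = [
    Event(1.0, actors=["alice", "bob"], memes=["#gop"]),
    Event(2.0, actors=["alice"], memes=["#gop", "http://example.com"]),
]
weights = accumulate_edges(events)
```

Because the resulting network mixes actor-actor, actor-meme, and meme-meme edges, it is not strictly bipartite, in line with the description above. Restricting the events to a time range before accumulating would give a snapshot of the network for that period.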

3.1 Meme Types 

To study the diffusion of information on Twitter it is necessary to 
single out features that can be used to identify a specific topic as it 
propagates through the social substrate. While there exist many
sophisticated statistical techniques for modeling the underlying topics
present in bodies of text [9, 14], the small size of each tweet
and the contextual drift present in streaming data create significant
complications [50]. Fortunately, several conventions shared among
Twitter users allow us to avoid these issues entirely. We focus on 
the following features to identify different types of memes: 

Hashtags The Twitter community uses tokens prefixed by a hash 
(#) to label the topical content of tweets. Some examples 
of popular tags are #gop, #obama, and #desen, marking
discussion about the Republican party, President Obama, and 
the Delaware race for U.S. Senate, respectively. These are 
often called hashtags or #tags. 

Mentions A Twitter user can call another user's attention to a
particular post by including that user's screen name in the post,
prepended by the @ symbol. These mentions can be used
as a way to carry on conversations between users (replies),
or to denote that a particular Twitter user is being discussed
(mentions).

URLs We extract URLs from tweets by matching strings of valid
URL characters that begin with http://. Honeycutt et al.
suggest that URLs are most clearly associated with the
transmission of information on Twitter [24].

Phrases Finally, we consider the entire text of the tweet itself to be
a meme, once all Twitter metadata, punctuation, and URLs
have been removed.

Figure 2: Example of a meme diffusion network involving three
users mentioning and retweeting each other. The values of various
node statistics are shown next to each node. The strength
s refers to weighted degree.

Relying on these conventions we are able to focus on the ways 
in which a large number of memes propagate through the Twitter 
social network. 
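
As a rough illustration of the four meme types, the following Python sketch extracts them from the text of a single tweet. The regular expressions, the lowercasing, and the treatment of a leading "RT" token as metadata are assumptions made for this example; the normalization actually used by the system is not specified here.

```python
import re
import string

# Assumed patterns; the system's real tokenization rules may differ.
URL = re.compile(r"http://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
RETWEET_MARK = re.compile(r"\bRT\b")


def extract_memes(text):
    """Return the hashtag, mention, URL, and phrase memes found in a tweet."""
    urls = URL.findall(text)
    rest = URL.sub(" ", text)  # strip URLs first so '#' fragments are ignored
    hashtags = [h.lower() for h in HASHTAG.findall(rest)]
    mentions = [m.lower() for m in MENTION.findall(rest)]
    # Phrase meme: what remains once metadata, punctuation, and URLs are gone.
    phrase = HASHTAG.sub(" ", rest)
    phrase = MENTION.sub(" ", phrase)
    phrase = RETWEET_MARK.sub(" ", phrase)
    phrase = phrase.translate(str.maketrans("", "", string.punctuation))
    phrase = " ".join(phrase.lower().split())
    return {"hashtags": hashtags, "mentions": mentions,
            "urls": urls, "phrase": phrase}


memes = extract_memes(
    "RT @fred: Stimulus $ for coke monkeys http://example.com/x #gop #tcot"
)
```

In this example the tweet yields two hashtag memes, one mention meme, one URL meme, and the phrase meme "stimulus for coke monkeys".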

3.2 Network Edges 

To represent the flow of information through the Twitter com- 
munity we construct a directed graph in which nodes are individual 
user accounts. An example diffusion network involving three users 
is shown in Figure 2. An edge is drawn from node A to B when
either B is observed to retweet a message from A, or A mentions 
B in a tweet. The weight of an edge is incremented each time we 
observe an event connecting two users. In this way, either type of 
edge can be understood to represent a flow of information from A 
to B. Observing a retweet at node B provides implicit confirma- 
tion that information from A appeared in B's Twitter feed, while 
a mention of B originating at node A explicitly confirms that A's 
message appeared in B's Twitter feed. This may or may not be noticed
by B; mention edges are therefore less reliable indicators of
information flow than retweet edges.

We determine who was replied to or retweeted not by parsing the 
text of the tweet, which can be ambiguous (as in the case when a 
tweet is marked as being a 'retweet' of multiple people). Rather, 
we rely on Twitter metadata that we download along with the text 
of the tweet, and which designates users as being the users replied 
to or retweeted by each message. Thus, while the text of a tweet 
may contain several mentions, we only draw an edge to the user 
who is explicitly designated as the mentioned user by the tweet 
metadata. Note that this is separate from our use of mentions as 
memes (§3.1), which we parse from the text of the tweet.
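
A minimal sketch of this edge construction follows. The dictionary keys user, retweeted_user, and mentioned_user are hypothetical stand-ins for the corresponding Twitter metadata fields, which are not named in the text.

```python
from collections import defaultdict


def build_diffusion_graph(tweets):
    """Return {(source, target): weight} for a list of tweet records."""
    weights = defaultdict(int)
    for tweet in tweets:
        author = tweet["user"]
        retweeted = tweet.get("retweeted_user")
        mentioned = tweet.get("mentioned_user")
        if retweeted:
            # B retweets A: information flowed from A to B.
            weights[(retweeted, author)] += 1
        if mentioned:
            # A mentions B: A's message appeared in B's feed.
            weights[(author, mentioned)] += 1
    return dict(weights)


tweets = [
    {"user": "B", "retweeted_user": "A"},
    {"user": "A", "mentioned_user": "B"},
    {"user": "C", "retweeted_user": "A"},
]
graph = build_diffusion_graph(tweets)
```

Here the retweet by B and the mention of B both increment the same directed edge from A to B, so its weight is 2. Keeping the two edge types in separate dictionaries would be a natural variation if their differing reliability needs to be modeled.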




Figure 3: The Truthy system architecture. 



4. TRUTHY SYSTEM ARCHITECTURE 

We built a system called Truthy (truthy.indiana.edu). A
general overview of the components of our system is shown in
Figure 3. Truthy includes several components: a low-level system
overseeing the collection and processing of the raw data feeds
from the Twitter API, the meme detection framework, the Klatsch
framework responsible for computing key network statistics and
layouts, and a Web-based presentation framework that allows us to
collect user input on which memes the community deems most suspicious.
These components are described next. Network statistics
and community-generated annotations are the primary inputs to the
classification apparatus discussed in §6.

4.1 Streaming Data Collection 

To collect meme diffusion data we rely on whitelisted access to 
the Twitter 'gardenhose.' The gardenhose provides detailed data on 
a sample of the Twitter corpus at a rate that varied from roughly
4 million tweets per day near the beginning of our study to around 8
million tweets per day at the time of this writing. We distinguish
here between the gardenhose and the firehose, the latter of which 
provides an unfiltered dump of all Twitter's traffic, but is only avail- 
able to users who purchase access. While the process of sampling 
edges (tweets between users) from a network to investigate structural
properties has been shown to produce suboptimal approximations
of true network characteristics [33], we find that the analyses
described below are able to produce accurate classifications
of truthy memes even in light of this shortcoming. All collected 
tweets are stored in files at a daily time resolution. We maintain 
files both in a verbose JSON format containing all the features pro- 
vided by Twitter, and in a more compact format that contains only 
the features used in our analysis. This collection is accomplished 
by a component of our system that operates asynchronously from 
the others. 

4.2 Meme Detection 

A second component of our system is devoted to scanning the 
collected tweets in real time, by pulling data from the daily files 
described above. The task of this meme detection component (Figure 4)
is to determine which of the collected tweets are to be stored
in our database and subjected to further analysis. Our goal is to col- 
lect only tweets (a) with content related to the political elections, 
and (b) of sufficiently general interest. We implemented a filtering 
step for each of these criteria, described below. 

To identify politically relevant tweets, we turn to a hand-curated 
collection of approximately 2500 keywords relating to the 2010 
U.S. midterm elections. This keyword list contains the names of 
all candidates running for U.S. federal office, as well as any common
variations and known Twitter account usernames. The collection
further contains the top 100 hashtags that co-occurred with the
hashtags #tcot and #p2 (the top conservative and liberal tags,
respectively) during the last ten days of August 2010. The motivation
for including explicit hashtags in the filter is not to ensure that
these terms are tracked by the system (though this is a side effect),
but rather to capitalize on the common behavior of Twitter users
whereby they include chains of tags to identify multiple relevant
topics of interest. This component, too, operates asynchronously.
It is capable of processing tweets at a rate about 10 times faster
than our sampling rate, allowing it to easily handle bursts of traffic.
We refer to this component as the tweet_filter.

Figure 4: Our meme detection and tracking system consists of
three separate, asynchronous components: the tweet collection,
which downloads tweets and saves them to disk; the tweet
filter, which determines tweets likely to relate to politics; and
the meme filter, which identifies memes of significant general
interest and saves them in the database.

Figure 5: The Klatsch framework architecture.
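
The keyword test performed at this stage might look like the sketch below. It assumes single-token keywords and a simple regular-expression tokenizer, both of which are simplifications introduced for the example; the three keywords shown stand in for the full hand-curated list.

```python
import re

TOKEN = re.compile(r"[#@]?\w+")  # assumed tokenizer; keeps # and @ prefixes


def make_tweet_filter(keywords):
    """Build a predicate that accepts tweets containing any keyword."""
    wanted = {k.lower() for k in keywords}

    def is_political(text):
        return any(tok in wanted for tok in TOKEN.findall(text.lower()))

    return is_political


# Hypothetical three-entry keyword list for illustration only.
is_political = make_tweet_filter(["#tcot", "#p2", "coakley"])
accepted = is_political("Vote today #desen #tcot")
rejected = is_political("Nice weather today")
```

The first tweet is accepted because of #tcot, which also brings the co-occurring tag #desen into the collection. This is the chaining behavior that motivates including popular hashtags in the keyword list.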

Simply including all the tweets at this step would have resulted 
in a proliferation of distinct memes, as it would have included as 
a meme any hashtag, URL, username, or phrase mentioned by any 
user even one time. We thus implemented a second stage of fil- 
tering designed to identify those tweets containing memes of suf- 
ficiently general interest. We refer to this stage of filtering as the 
meme_filter.

The meme_filter, like the tweet_filter, reads tweets in
real time. However, since tweets in the gardenhose are not guaranteed
to be in strict temporal order, the meme_filter inserts all
tweets read into a priority queue that orders them by their timestamp.
Tweets are then processed in the order that they are removed
from the queue. This does not guarantee that tweets will be read
in sorted order, but greatly decreases the number of out-of-order
tweets: for a priority queue of size n, any tweet less than n
places out of order will be correctly ordered. We found empirically
that n = 1000 decreased out-of-order tweets to manageable levels.
It is necessary to present tweets in order to subsequent layers,
to make maintenance of the sliding activation window (described
next) more efficient. Thus any out-of-order tweets remaining after
this step are discarded.
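
The reordering step can be illustrated with Python's heapq module. This is a sketch of the general technique (a bounded min-heap used as a reorder buffer), assuming tweets are represented as (timestamp, id) pairs; it is not the system's actual code.

```python
import heapq


def reorder(stream, n=1000):
    """Yield (timestamp, tweet_id) pairs in nondecreasing timestamp order.

    Items are buffered in a min-heap holding at most n entries. Anything
    that still arrives behind an already emitted timestamp is dropped.
    """
    heap = []
    last_emitted = float("-inf")
    for item in stream:
        heapq.heappush(heap, item)
        if len(heap) > n:
            ts, tweet_id = heapq.heappop(heap)
            if ts >= last_emitted:
                last_emitted = ts
                yield ts, tweet_id
            # Otherwise the tweet is too far out of order and is discarded.
    while heap:  # flush the buffer at end of stream
        ts, tweet_id = heapq.heappop(heap)
        if ts >= last_emitted:
            last_emitted = ts
            yield ts, tweet_id


stream = [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e"), (0, "f")]
ordered = list(reorder(stream, n=2))
```

With a buffer of size 2, the tweets at timestamps 2 and 4 are restored to their proper positions, while the tweet at timestamp 0 arrives too late to be placed and is dropped.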

The meme_filter's goal is to extract only those tweets that
pertain to memes of significant general interest. To this end, we
extract all memes (of the types described in §3.1) from each incoming
tweet, and track the activation over the past hour of each
meme, in real time. If any meme exceeds a rate threshold of five
mentions in a given hour it is considered 'activated'; any tweets
containing that meme are then stored. If a tweet contains a meme
that is already considered activated due to its presence in previous
tweets, it is stored immediately. When the mention rate of the
meme drops below the activation limit, it is no longer considered
activated and tweets containing the meme are no longer automatically
stored. Note that a tweet can contain more than one meme,
and thus the activation of multiple memes can be triggered by the 
arrival of a single tweet. We chose a low rate threshold with the 
understanding that if a meme is observed five times in our sample 
it is likely mentioned many more times in Twitter at large. 

The tracking of a new tweet consists of three steps: (i) removing 
tweets outside the current sliding activation window; (ii) extracting 
memes from the tweet and tracking their activation; and (iii) storing 
tweets related to any now activated memes. Because the tweets are 
presented in sorted order, and the number of memes in a tweet is 
bounded by the constant tweet length, step (i) can be completed 
in time linear in the number of old tweets, and steps (ii) and (iii) 
require constant time. 
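
The three steps can be sketched as follows. The class and method names are hypothetical, and two points left open by the description are fixed by assumption: a meme counts as activated once it reaches five mentions within the window, and only tweets arriving from that moment on are stored.

```python
from collections import Counter, deque


class MemeFilter:
    """Sliding-window activation tracker (illustrative sketch only)."""

    def __init__(self, window=3600.0, threshold=5):
        self.window = window        # window length in seconds
        self.threshold = threshold  # mentions needed for activation
        self.recent = deque()       # (timestamp, memes), oldest first
        self.counts = Counter()     # mentions per meme inside the window

    def process(self, timestamp, memes):
        """Track one tweet; return True if it should be stored."""
        # (i) Remove tweets that have slid out of the window.
        while self.recent and self.recent[0][0] <= timestamp - self.window:
            _, old_memes = self.recent.popleft()
            for meme in old_memes:
                self.counts[meme] -= 1
                if self.counts[meme] == 0:
                    del self.counts[meme]
        # (ii) Count the memes carried by the new tweet.
        memes = set(memes)
        self.recent.append((timestamp, memes))
        for meme in memes:
            self.counts[meme] += 1
        # (iii) Store the tweet if any of its memes is activated.
        return any(self.counts[m] >= self.threshold for m in memes)


meme_filter = MemeFilter()
stored = [meme_filter.process(60.0 * i, ["#gop"]) for i in range(6)]
later = meme_filter.process(7200.0, ["#gop"])
```

The sketch relies on tweets arriving in timestamp order, which is why the reordering step precedes it: expired tweets can then always be removed from the front of the deque.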

Prior to settling on this detection strategy for topics of general 
interest, we experimented with a more complicated strategy based 
on examining the logarithmic derivative of the number of mentions 
of a particular meme, computed hourly. This approach was inspired 
by previous work on attention dynamics in Wikipedia [40]. Since
many memes with bursty behavior have low volume, we augmented 
the burst detection algorithm with a second predicate that included 
memes that appeared in a minimum percentage of the tweets over 
the past hour. We eventually discarded this hybrid detection mech- 
anism due to the complexity of choosing appropriate parameters, in 
favor of the simpler scheme described above. 

Our system has tracked a total of approximately 305 million 
tweets collected from September 14 until October 27, 2010. Of 
these, 1.2 million contain one or more of our political keywords; 
detection of interesting memes further reduced this set to 600,000 
tweets actually entered in our database for analysis. 

4.3 Network Analysis 

The Klatsch framework is responsible for network analysis and 
layout for visualization of the diffusion patterns on the Truthy Web 
site. It consists of several components, as depicted in Figure 5. The
key components of the system are: a set of input adapters for im- 
porting external social network data into the Klatsch data model; 
support for a variety of standard graph layout and visualization 
algorithms; a flexible scripting language for coding site-agnostic 
analysis modules; and a set of export modules, including an em- 
bedded light-weight Web server, for visualizing analysis, saving 
statistical results, supporting interactive Web tools, and producing 
publication-ready graphs and tables. 

Klatsch includes an embedded domain-specific scripting language
with advanced features such as first-class functions, streams, and
map/filter/reduce primitives. For instance, the inclusion of streams
as a first-class data type supports the lazy evaluation of algorithms
that operate on the nodes and edges of large graphs. Our graph anal- 
ysis and visualization algorithms are implemented in the Klatsch 
language. As an example of its expressiveness, consider the following
problem: given a user and a meme, find out the average
proportion of tweets about that meme that the user accounts for, 
among all tweets received by people he tweets to. (In other words, 
if Fred tweets a meme to Barney 3 times and Barney receives 6 
tweets about that meme in total, that's 0.5.) This complex calculation
can be performed by the following code snippet:

find_meme_prop = proc (actor_id)
    g.eo(anode(actor_id)).map(proc (e) e.w()
        / g.si(e.dst())).list().mean();
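
For readers unfamiliar with the Klatsch language, the same calculation can be written in Python. This rendering assumes that eo returns a node's outgoing edges, w an edge's weight, dst its destination, and si a node's in-strength; these meanings are inferred from the problem statement rather than from Klatsch documentation.

```python
from statistics import mean


def find_meme_prop(edges, actor_id):
    """Average share of each recipient's incoming weight due to actor_id.

    edges maps (source, target) pairs to weights for a single meme.
    """
    in_strength = {}
    for (_, dst), weight in edges.items():
        in_strength[dst] = in_strength.get(dst, 0) + weight
    shares = [
        weight / in_strength[dst]
        for (src, dst), weight in edges.items()
        if src == actor_id
    ]
    return mean(shares)  # raises StatisticsError if actor_id sent nothing


# Fred tweets the meme to Barney 3 times; Barney receives 6 tweets in total.
edges = {("fred", "barney"): 3, ("wilma", "barney"): 3}
prop = find_meme_prop(edges, "fred")
```

For the Fred and Barney example in the text, this gives 0.5.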



To characterize the structure of the diffusion network we com- 
pute several statistics based on the topology of the largest con- 
nected component of the retweet / mention graph. These include 
the number of nodes and edges in the graph, the mean degree and 
strength of nodes in the graph, mean edge weight, clustering coefficient
of the largest connected component, and the standard deviation
and skew of each network's in-degree, out-degree and strength
distributions (cf. Figure 2). Additionally we track the out-degree
and out-strength of the most prolific broadcaster, as well as the
in-degree and in-strength of the most focused-upon user. We also
monitor the number of unique injection points of the meme in the
largest connected component, reasoning that organic memes (such
as those relating to news events) will be associated with a larger
number of originating users.
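
Several of these statistics can be computed directly from a weighted edge list, as in the sketch below. It assumes the edges already belong to the largest connected component, omits the clustering coefficient, and treats nodes with no incoming edges as injection points; that last definition is an assumption, since the text does not define the term precisely.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness; returns 0.0 for a constant sequence."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return 0.0
    return mean(((v - mu) / sd) ** 3 for v in values)


def network_stats(edges):
    """Summarize a directed graph given as {(source, target): weight}."""
    nodes = sorted({n for pair in edges for n in pair})
    k_in = dict.fromkeys(nodes, 0)
    k_out = dict.fromkeys(nodes, 0)
    s_in = dict.fromkeys(nodes, 0)
    s_out = dict.fromkeys(nodes, 0)
    for (src, dst), weight in edges.items():
        k_out[src] += 1
        k_in[dst] += 1
        s_out[src] += weight
        s_in[dst] += weight
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "mean_k": mean(k_in[n] + k_out[n] for n in nodes),
        "mean_s": mean(s_in[n] + s_out[n] for n in nodes),
        "mean_w": mean(edges.values()),
        "max_k_out": max(k_out.values()),
        "max_s_out": max(s_out.values()),
        "max_k_in": max(k_in.values()),
        "max_s_in": max(s_in.values()),
        "std_k_in": pstdev(k_in.values()),
        "skew_k_in": skewness(list(k_in.values())),
        # Assumed definition: nodes that received the meme from nobody.
        "injection_points": sum(1 for n in nodes if k_in[n] == 0),
    }


stats = network_stats({("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 1})
```

The same pattern extends to the out-degree and strength distributions by passing the corresponding dictionaries to pstdev and skewness.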

4.4 Sentiment Analysis 

In addition to the graph-based statistics described above we utilize
a modified version of the Google-based Profile of Mood States
(GPOMS) sentiment analysis method introduced by Pepe and
Bollen [37] in the analysis of meme-specific sentiment on Twitter.
The GPOMS tool assigns to a body of text a six-dimensional vec- 
tor with bases corresponding to different mood attributes, namely 
Calm, Alert, Sure, Vital, Kind, and Happy. To produce scores for 
a meme along each of the six dimensions, GPOMS relies on a vo- 
cabulary taken from an established psychometric evaluation instru- 
ment known as POMS |34| . The original POMS test asks evaluees 
to rate their emotional state with respect to a vocabulary of 72 ad- 
jectives, and associates each adjective with corresponding mood 
dimensions. The strength of an evaluees identification with various 
adjectives contributes to summed scores along the mood dimen- 
sions. 

Users of social media, however, are notoriously resourceful in
their ability to create new lexicons for expression. To address this
issue, the GPOMS tool relies on the Google n-gram corpus, which
identifies the co-occurrence rates of approximately 2.5 billion
token pairings observed in a corpus of approximately one trillion Web
pages. By associating the original POMS terms with their Google
5-gram partners of the form "I feel X and Y," where X is a POMS
term, the GPOMS tool is able to expand its target lexicon to 964
tokens, each of which can be transitively associated with an
underlying mood dimension. Consequently, observations of these tokens
in a body of text contribute to the magnitude of a given mood
dimension in proportion to the co-occurrence rate of the term and
each of the original 72 POMS adjectives. The resulting mood vector
is normalized to unit length, resulting in magnitudes ranging
between -1 and 1 along each of the six dimensions. We applied
the GPOMS methodology to the collection of tweets, obtaining a
six-dimensional mood vector for each meme.
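
The scoring and normalization steps can be illustrated with a short Python sketch. The lexicon below is a toy invented for this example; the actual GPOMS lexicon of 964 tokens and its co-occurrence weights are not reproduced here, and the tokenization and function name are likewise assumptions.

```python
import math
import re

DIMENSIONS = ("Calm", "Alert", "Sure", "Vital", "Kind", "Happy")

# Toy lexicon (hypothetical): token -> {dimension: signed association weight}.
TOY_LEXICON = {
    "relaxed": {"Calm": 0.8},
    "tense": {"Calm": -0.9},
    "cheerful": {"Happy": 0.7, "Vital": 0.3},
    "miserable": {"Happy": -0.8},
}

def mood_vector(texts, lexicon=TOY_LEXICON):
    """Sum lexicon weights over all tokens, then normalize to unit length."""
    totals = dict.fromkeys(DIMENSIONS, 0.0)
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            for dim, weight in lexicon.get(token, {}).items():
                totals[dim] += weight
    norm = math.sqrt(sum(v * v for v in totals.values()))
    if norm == 0:
        return totals  # no lexicon terms observed: all-zero vector
    return {dim: v / norm for dim, v in totals.items()}

vector = mood_vector(["I feel relaxed and cheerful", "so tense today"])
```

Because the vector is scaled to unit Euclidean length, every component necessarily falls between -1 and 1, which matches the range stated above.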

Figure 6: Screenshots of the Truthy Web site meme overview
page (top) and meme detail page (bottom).



4.5 Web Interface 

The final component of our analytical framework is a dynamic Web
interface that allows users to inspect memes through various views
and annotate those they consider to be truthy. Raw counts of these
user annotations are used as input to the classification apparatus
described in § 6. To facilitate the decision-making process, we
provide a mixed presentation of statistical information and
interactive visualization elements. Figure 6 provides snapshots of
the summary and detailed views available on the Truthy site.

Users who wish to explore the Truthy database using the Web 
interface can sort memes according to a variety of ranking criteria, 
including the size of the largest connected component, number of 
user annotations, number of users, number of tweets, number of 
tweets per user, number of retweets, and number of meme injection 
points. This list-based presentation of memes functions as a con- 
cise, high-level view of the data, allowing users to examine related 
keywords, time of most recent activity, tweet volume sparklines 
and thumbnails of the information diffusion network. At this high 
level users can examine a large number of memes quickly and sub- 
sequently drill down into those that exhibit interesting behavior. 



Once a user has selected an individual meme for exploration, 
she is presented with a more detailed presentation of statistical data 
and interactive visualizations. Here the user can examine the
statistical data described above, tweets relating to the meme of interest,
and sentiment analysis data. Additionally, users can explore the
temporal data through an interactive annotated timeline, inspect a 
force-directed layout of the meme diffusion network, and view a 
map of the tweet geo-locations. Upon examining these features, 
the user is then able to make a decision as to whether this meme is 
truthy or not, and can indicate her conclusion by clicking a button 
at the top of the page. 

5. EXAMPLES OF TRUTHY MEMES 

The Truthy site allowed us to identify several truthy memes. 
Some of these cases caught the attention of the popular press due 
to the sensitivity of the topic in the run-up to the political elections,
and subsequently many of the accounts involved were suspended 
by Twitter. Below we illustrate a few representative examples. 

#ampat The #ampat hashtag is used by many conservative users 
on Twitter. What makes this meme suspicious is that the 
bursts of activity are driven by two accounts, @CSteven 
and @CStevenTucker, which are controlled by the same 
user, in an apparent effort to give the impression that more 
people are tweeting about the same topics. This user posts 
the same tweets using the two accounts and has generated a 
total of over 41,000 tweets in this fashion. See Figure 7(A)
for the diffusion network of this hashtag.

@PeaceKaren_25 This account did not disclose information 
about the identity of its owner, and generated a very large 
number of tweets (over 10,000 in four months). Almost all of 
these tweets supported several Republican candidates. Another
account, @HopeMarie_25, behaved similarly to @PeaceKaren_25,
retweeting the accounts of the same candidates and boosting the same
Web sites. It did not produce any original tweets, and in addition
it retweeted all of @PeaceKaren_25's tweets, promoting that account.
These accounts also succeeded at creating a 'twitter bomb': for a
time, Google searches for "gop leader" returned these tweets in the
first page of results. A visualization of the interaction between
these two accounts can be seen in Figure 7(B). Both accounts had been
suspended by Twitter by the time of this writing.

gopleader.gov This meme is the Web site of the Republican
Leader John Boehner. It looks truthy because it is boosted
by the two suspicious accounts described above. The diffusion
of this URL is shown in Figure 7(C).

How Chris Coons budget works- uses tax $ 2 attend dinners
and fashion shows

This is one of a set of truthy memes smearing Chris Coons,
the Democratic candidate for U.S. Senate from Delaware.
Looking at the injection points of these memes, we uncovered
a network of about ten bot accounts. They inject thousands of
tweets with links to posts from the freedomist.com Web site.
To avoid detection by Twitter and increase visibility to
different users, duplicate tweets are disguised by adding
different hashtags and appending junk query parameters to the
URLs. This works because many URL-shortening services ignore
query strings when processing redirect requests. To generate
retweeting cascades, the bots also coordinate to mention a few
popular users. These targets get the impression of receiving
the same news from several different people, and are more
likely to think it is true and spread it to their followers.
Most of the bot accounts in this network can be traced back to
a single person who runs the freedomist.com Web site. The
diffusion network corresponding to this case is illustrated in
Figure 7(D).

Table 1: Features used in truthy classification.
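
The disguise used by the bot network in the Chris Coons example (varying hashtags and junk query parameters) suggests a simple countermeasure: compare tweets after stripping those parts. The Python sketch below illustrates the idea only; it is not the method used by Truthy or Twitter, the function names are hypothetical, and the sample tweets and URLs are invented. Dropping query strings would also merge distinct pages on sites where the query is meaningful, so this is a heuristic at best.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")

def canonical_url(url):
    """Drop the query string and fragment; lowercase the scheme and host."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, "", ""))

def tweet_key(text):
    """Reduce a tweet to its canonical URLs plus its text without hashtags."""
    urls = tuple(sorted(canonical_url(u) for u in URL_RE.findall(text)))
    stripped = HASHTAG_RE.sub("", URL_RE.sub("", text))
    words = " ".join(stripped.lower().split())
    return urls, words

def group_duplicates(tweets):
    """Return groups of two or more tweets that share the same key."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet_key(tweet)].append(tweet)
    return [group for group in groups.values() if len(group) > 1]

# Invented sample data: the first two tweets differ only in disguise.
tweets = [
    "Budget facts http://example.com/abc?x=123 #tcot",
    "Budget facts http://example.com/abc?junk=zz9 #p2 #gop",
    "Unrelated http://example.com/other",
]
duplicates = group_duplicates(tweets)
```

Here the first two tweets collapse to the same key and are reported as one duplicate group, while the third remains on its own.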

These are just a few instructive examples of characteristically 
truthy memes our system was able to identify. Two other net- 
works of bots were shut down by Twitter after being detected by 
Truthy. In one case we observed the automated accounts using 
text segments drawn from newswire services to produce multiple 
legitimate-looking tweets in between the injection of URLs. These 
instances highlight several of the more general properties of truthy 
memes detected by our system. 

Figure 7 also shows the diffusion networks for four legitimate
memes. One, #Truthy, was injected as an experiment by the NPR
Science Friday radio program. Another, @senjohnmccain, displays
two different communities in which the meme was propagated: one by
retweets from @ladygaga in the context of discussion on the repeal
of the "Don't ask, don't tell" policy on gays in the military, and
the other by mentions of @senjohnmccain. A gallery with detailed
explanations about various truthy and legitimate memes can be found
on our Web site (truthy.indiana.edu/gallery).

Figure 7: Diffusion networks of sample memes from our dataset. Edges are represented using the same notation as in Figure 2.
Four truthy memes are shown in the top row and four legitimate ones in the bottom row. (A) #ampat (B) @PeaceKaren_25
(C) gopleader.gov (D) "How Chris Coons budget works- uses tax $ 2 attend dinners and fashion shows" (E) #Truthy (F)
@senjohnmccain (G) on.cnn.com/aVMu5y (H) "Obama said taxes have gone down during his administration. That's ONE
way to get rid of income tax — getting rid of income"

6. TRUTHINESS CLASSIFICATION

As an application of the analyses performed by the Truthy system,
we trained a binary classifier to automatically label legitimate
and truthy memes.

We began by producing a hand-labeled corpus of training examples
in three classes: 'truthy,' 'legitimate,' and 'remove.' We labeled
these by presenting random memes to several human reviewers and
asking them to place each meme in one of the three categories. They
were told to classify a meme as 'truthy' if a significant portion
of the users involved in that meme appeared to be spreading it in
misleading ways, e.g., if a number of the accounts tweeting about
the meme appeared to be robots or sock puppets, the accounts
appeared to follow only other propagators of the meme (clique
behavior), or the users engaged in repeated reply/retweet exchanges
exclusively with other users who had tweeted the meme. 'Legitimate'
memes were described as memes representing normal, legitimate use
of Twitter: several non-automated users conversing about a topic.
The final category, 'remove,' was to be used for those memes that
were in a foreign language, or otherwise did not seem to be related
to American politics (#youth, for example). These memes were not
used in the training or evaluation of classifiers.

After we had gathered several hundred annotations we observed
an imbalance in our labeled data, with less than 10% truthy. Rather
than simply resampling, as is common practice in the case of class
imbalance, we performed a second round of human annotations on
previously unlabeled memes predicted to be 'truthy' by the
classifier trained in the previous round. In this way we were able
to manually label a larger portion of truthy memes. Our final
training dataset consisted of 366 training examples: 61 truthy
memes and 305 legitimate ones. In those cases where multiple
reviewers disagreed on the labeling of a meme, we determined the
final label by a group discussion among all reviewers. The dataset
is available online.

We used the WEKA machine learning package [22] for classifier
training, providing each classification strategy with 32 features
about each meme, as shown in Table 1. We experimented with two
classifiers: AdaBoost with DecisionStump, and SVM. As the number
of truthy memes was still smaller than the number of legitimate
ones, we also experimented with resampling the training data to
balance the classes prior to classification. The performance of
the classifiers is shown in Table 2, as evaluated by their accuracy
and the area under their ROC curves. In all cases these preliminary
results are quite encouraging, with accuracy around or above 90%.
The best results are obtained by AdaBoost with resampling. Table 3
further shows the confusion matrices for AdaBoost. In this task,
false negatives (truthy memes incorrectly classified as legitimate)
are less desirable than false positives. In the worst case, the
false negative rate is 5%. Table 4 shows the 10 most discriminative
features, as determined by χ² analysis. Network features appear to
be more discriminative than sentiment scores or the few user
annotations that we collected.
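
To make the resampling and evaluation steps concrete, the following Python sketch shows one common way to balance classes (random oversampling of the minority class) and to derive accuracy and the false negative rate from paired labels. It is an illustration under stated assumptions, not the WEKA procedure used above: WEKA's resampling filter may behave differently, the function names are hypothetical, and the false negative rate is computed here as FN / (TP + FN), which may differ from the convention behind the reported figures.

```python
import random

def oversample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    rng.shuffle(balanced)
    return balanced

def evaluate(true_labels, predicted, positive="truthy"):
    """Confusion counts, accuracy, and false negative rate."""
    tp = fn = fp = tn = 0
    for truth, guess in zip(true_labels, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fn + fp + tn
    return {
        "confusion": {"tp": tp, "fn": fn, "fp": fp, "tn": tn},
        "accuracy": (tp + tn) / total,
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
    }

# Class sizes from the text: 61 truthy and 305 legitimate memes.
labels = ["truthy"] * 61 + ["legitimate"] * 305
balanced = oversample(list(range(366)), labels)
```

With the class sizes given in the text, oversampling yields 305 examples of each class. In practice the duplication should be applied to training folds only, so that copies of one meme do not appear on both sides of a train/test split.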

7. DISCUSSION 

We introduced a system for the real-time analysis of meme dif- 
fusion from microblog streams. The Klatsch framework will soon 
be released as open source. We described the Truthy system and 
Web site, which leverage this framework to track political memes 
in Twitter and help detect astroturfing campaigns in the context of 
U.S. political elections. 

Our simple classification system yielded promising results in ac- 
curately detecting truthy memes based on features extracted from 
the topology of the diffusion networks. Using this system we have 
been able to identify a number of genuinely truthy memes. Though 
few of these exhibit the explosive growth characteristic of true
viral memes, they are nonetheless clear examples of coordinated
attempts to deceive Twitter users. Truthy memes are often spread
initially by bots, causing them to exhibit pathological diffusion graphs
relative to what is observed in the case of organic memes. These 
graphs can take many forms, including high numbers of unique 
injection points with few or no connected components, strong star- 
like topologies characterized by high average degree, and most tell- 
ingly large edge weights between dyads in graphs that exhibit either 
of the above properties. 

The accuracy scores we observe in the classification task are
quite high. We hypothesize that this performance is partially
explained by the fact that a substantial proportion of the memes
were failed attempts to start a cascade. In these cases the networks
reduced to isolated injection points or small components, resulting
in trivial network features that allowed for easy classification.

Despite the fact that many of the memes discussed in this pa- 
per are characterized by small diffusion networks, it is important to 
note that this is the stage at which such attempts at deception must 
be identified. Once one of these attempts is successful at gaining 
the attention of the community, it will quickly become indistin- 
guishable from an organic meme. Therefore, the early identifica- 
tion and termination of accounts associated with astroturf memes 
is critical. 

In the future we intend to add more views to the Web site,
including views on the users, such as the ages of the accounts, and tag
clouds to interpret the sentiment analysis scores. We need to col- 
lect more labeled data about truthy memes in order to achieve more 
meaningful classification results, and will also explore the use of 
additional features in the classifiers, such as account ages for the 
most active users in a meme, and reputation features for users based 
on the memes to which they contribute. Another important area to 
address is that of sampling bias, since the properties of the sample 
made available in the Twitter gardenhose are currently unknown. 
To explore this, we intend to track injected memes of various sizes 
and with different topological properties of their diffusion graphs. 